SEQUENCE LISTING

(1) GENERAL INFORMATION:



(i) APPLICANT:
Covacci, Antonello
Bugnoli, Massimo
Telford, John
Macchia, Giovanni
Rappuoli, Rino

(ii) TITLE OF INVENTION: Helicobacter Pylori Proteins Useful for Vaccines and Diagnostics

(iii) NUMBER OF SEQUENCES:



(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Chiron Corporation

(B) STREET: 4560 Horton Street

(C) CITY: Emeryville 

(D) STATE: California 

(E) COUNTRY: USA 

(F) ZIP: 94608-2916 



(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25



(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: 08/471,491

(B) FILING DATE: 06-JUNE-1995

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: McClung, Barbara G.

(B) REGISTRATION NUMBER: 33,113

(C) REFERENCE/DOCKET NUMBER: 0316.003

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (510) 601-2708

(B) TELEFAX: (510) 655-3542



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
GCAAGCTTAT CGATGTCGAC TCGAGCT 
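
Each entry declares its length under (A) LENGTH, so a transcribed sequence line can be sanity-checked by counting its residues. The following is a minimal Python sketch of such a check, not part of the filing or of PatentIn; the function names `sequence_length` and `matches_declared_length` are hypothetical, and it assumes the only non-residue characters in a line are the spaces separating the 10-base groups.

```python
def sequence_length(listing_line: str) -> int:
    """Count residues in a sequence-listing line.

    Assumption: whitespace is only a visual separator between
    10-base groups and carries no sequence information.
    """
    return sum(1 for ch in listing_line if not ch.isspace())


def matches_declared_length(listing_line: str, declared: int) -> bool:
    """Return True when the residue count equals the declared LENGTH."""
    return sequence_length(listing_line) == declared


# SEQ ID NO: 1 as printed above, with its declared length of 27 base pairs.
seq_id_1 = "GCAAGCTTAT CGATGTCGAC TCGAGCT"
print(matches_declared_length(seq_id_1, 27))
```

The same check could be run on the longer entries by joining their rows and dropping the running position numbers first; rows that still contain unreadable scan characters would need to be resolved against the original filing before the count is meaningful.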
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3960 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:



A|^fAAGAAAG GAAGAAAATG GAAATACAAC AAACACACCG CAAAATCAAT CGCCCTCTGG 60

TTTCTCTCGC TTTAGTAGGA GCATTAGTCA GCATCACACC GCAACAAAGT CATGCCGCCT 120

TftTCACAAC CGTGATCATT CCAGCCATTG TTGGGGGTAT CGCTACAGGC ACCGCTGTAG 180

gMcggtctc AGGGCTTCTT AGCTGGGGGC TCAAACAAGC CGAAGAAGCC AATAAAACCC 240

CABATAAACC CGATAAAGTT TGGCGCATTC AAGCAGGAAA AGGCTTTAAT GAATTCCCTA 300

ACAAGGAATA CGACTTATAC AGATCCCTTT TATCCAGTAA GATTGATGGA GGTTGGGATT 360

GGGGGAATGC CGCTAGGCAT TATTGGGTCA AAGGCGGGCA ACAGAATAAG CTTGAAGTGG 420

ATATGAAAGA CGCTGTAGGG ACTTATACCT TATCAGGGCT TAGAAACTTT ACTGGTGGGG 480

ATTTAGATGT CAATATGCAA AAAGCCACTT TACGCTTGGG CCAATTCAAT GGCAATTCTT 540

TTACAAGCTA TAAGGATAGT GCTGATCGCA CCACGAGAGT GGATTTCAAC GCTAAAAATA 600

TCTCAATTGA TAATTTTGTA GAAATCAACA ATCGTGTGGG TTCTGGAGCC GGGAGGAAAG 660

CCAGCTCTAC GGTTTTGACT TTGCAAGCTT CAGAAGGGAT CACTAGCGAT AAAAACGCTG 720

AAATTTCTCT TTATGATGGT GCCACGCTCA ATTTGGCTTC AAGCAGCGTT AAATTAATGG 780

GTAATGTGTG GATGGGCCGT TTGCAATACG TGGGAGCGTA TTTGGCCCCT TCATACAGCA 840

CGATAAACAC TTCAAAAGTA ACAGGGGAAG TGAATTTTAA CCACCTCACT GTTGGCGATA 900

AAAACGCCGC TCAAGCQGGC ATTATCGCTA ATAAAAAGAC TAATATTGGC ACACTGGATT 960

TGTGGCAAAG CGCCGGGTTA AACATTATCG CTCCTCCAGA AGGTGGCTAT AAGGATAAAC 1020

CCAATAATAC CCCTTCTCAA AGTGGTGCTA AAAACGACAA AAATGAAAGC GCTAAAAACG 1080

ACAAACAAGA GAGCAGTCAA AATAATAGTA ACACTCAGGT CATTAACCCA CCCAATAGTG 1140

CGCAAAAAAC AGAAGTTCAA CCCACGCAAG TCATTGATGG GCCTTTTGCG GGCGGCAAAG 1200

ACACGGTTGT CAATATCAAC CGCATCAACA CTAACGCTGA TGGCACGATT AGAGTGGGAG 1260

GGTTTAAAGC TTCTCTTACC ACCAATGCGG CTCATTTGCA TATCGGCAAA GGCGGTGTCA 1320

ATCTGTCCAA TCAAGCGAGC GGGCGCTCTC TTATAGTGGA AAATCTAACT GGGAATATCA 1380

C(|GTTGATGG GCCTTTAAGA GTGAATAATC AAGTGGGTGG CTATGCTTTG GCAGGATCAA 1440

G(j:§CGAATTT TGAGTTTAAG GCTGGTACGG ATACCAAAAA CGGCACAGCC ACTTTTAATA 1500

AQQATATTAG TCTGGGAAGA TTTGTGAATT TAAAGGTGGA TGCTCATACA GCTAATTTTA 1560

A;j:gGTATTGA TACGGGTAAT GGTGGTTTCA ACACCTTAGA TTTTAGTGGC GTTACAGACA 1620

AAGTCAATAT CAACAAGCTC ATTACGGCTT CCACTAATGT GGCCGTTAAA AACTTCAACA 1680

T'f^ATGAATT GATTGTTAAA ACCAATGGGA TAAGTGTGGG GGAATATACT CATTTTAGCG 1740

A^GATATAGG CAGTCAATCG CGCATCAATA CCGTGCGTTT GGAAACTGGC ACTAGGTCAC 1800

TT:|tCTCTGG GGGTGTTAAA TTTAAAGGTG GCGAAAAATT GGTTATAGAT GAGTTTTACT 1860

ATAGCCCTTG GAATTATTTT GACGCTAGAA ATATTAAAAA TGTTGAAATC ACCAATAAAC 1920

TTGCTTTTGG ACCTCAAGGA AGTCCTTGGG GCACATCAAA ACTTATGTTC AATAATCTAA 1980

CCCTAGGTCA AAATGCGGTC ATGGATTATA GCCAATTTTC AAATTTAACC ATTCAAGGGG 2040

ATTTCATCAA CAATCAAGGC ACTATCAACT ATCTGGTCCG AGGTGGGAAA GTGGCAACCT 2100

TAAGCGTAGG CAATGCAGCA GCTA-rGATGT TTAATAATGA TATAGACAGC GCGACCGGAT 2160

TTTACAAACC GCTCATCAAG ATTAACAGCG CTCAAGATCT CATTAAAAAT ACAGAACATG 2220

TTTTATTGAA AGCGAAAATC ATTGGTTATG GTAATGTTTC TACAGGTACC AATGGCATTA 2280

GTAATGTTAA TCTAGAAGAG CAATTCAAAG AGCGCCTAGC CCTTTATAAC AACAATAACC 2340

GCATGGATAC TTGTGTGGTG CGAAATACTG ATGACATTAA AGCATGCGGT ATGGCTATCG 2400

GCGATCAAAG CATGGTGAAC AACCCTGACA ATTACAAGTA TCTTATCGGT AAGGCATGGA 2460

AAAATATAGG GATCAGCAAA ACAGCTAATG GCTCTAAAAT TTCGGTGTAT TATTTAGGCA 2520

ATTCTACGCC TACTGAGAAT GGTGGCAATA CCACAAATTT ACCCACAAAC ACCACTAGCA 2580

ATGCACGTTC TGCCAACAAC GCCCTTGCAC AAAACGCTCC TTTCGCTCAA CCTAGTGCTA 2640

CTCCTAATTT AGTCGCTATC AATCAGCATG ATTTTGGCAC TATTGAAAGC GTGTTTGAAT 2700

TGGCTAACCG CTCTAAAGAT ATTGACACGC TTTATGCTAA CTCAGGCGCT CAAGGCAGGG 2760

ATCTCTTACA AACCTTATTG ATTGATAGCC ATGATGCGGG TTATGCCAGA AAAATGATTG 2820

ATGCTACAAG CGCTAATGAA ATCACCAAGC AATTGAATAC GGCCACTACC ACTTTAAACA 2880

ACATAGCCAG TTTAGAGCAT AAAACCAGCG GCTTACAAAC TTTGAGCTTG AGTAATGCGA 2940

T(fmTTTTAAA TTCTCGTTTA GTCAATCTCT CCAGGAGACA CACCAACCAT ATTGACTCGT 3000

TfP^CCAAACG CTTACAAGCT TTAAAAGACC AAAAATTCGC TTCTTTAGAA AGCGCGGCAG 3060

AA^TGTTGTA TCAATTTGCC CCTAAATATG AAAAACCTAC CAATGTTTGG GCTAACGCTA 3120

TTfGGGGAAC GAGCTTGAAT AATGGCTCTA ACGCTTCATT GTATGGCACA AGCGCGGGCG 3180

Ti^SACGCTTA CCTTAACGGG CAAGTGGAAG CCATTGTGGG CGGTTTTGGA AGCTATGGTT 3240

AT^GCTCTTT TAATAATCGT GCGAACTCCC TTAACTCTGG GGCCAATAAC ACTAATTTTG 3300

GC^TGTATAG CCGTATTTTT GCCAACCAGC ATGAATTTGA CTTTGAAGCT CAAGGGGCAC 3360

TAfGGAGCGA TCAATCAAGC TTGAATTTCA AAAGCGCTCT ATTACAAGAT TTGAATCAAA 3420

GCTATCATTA CTTAGCCTAT AGCGCTGCAA CAAGAGCGAG CTATGGTTAT GACTTCGCGT 3480

TTTTTAGGAA CGCTTTAGTG TTAAAACCAA GCGTGGGTGT GAGCTATAAC CATTTAGGTT 3540

CAACCAACTT TAAAAGCAAC AGCACCAATC AAGTGGCTTT GAAAAATGGC TCTAGCAGTC 3600

AGCATTTATT CAACGCTAGC GCTAATGTGG AAGCGCGCTA TTATTATGGG GACACTTCAT 3660

ACTTCTACAT GAATGCTGGA GTTTTACAAG AGTTCGCTCA TGTTGGCTCT AATAACGCCG 3720

CGTCTTTAAA CACCTTTAAA GTGAATGCCG CTCGCAACCC TTTAAATACC CATGCCAGAG 3780

TGATGATGGG TGGGGAATTA AAATTAGCTA AAGAAGTGTT TTTGAATTTG GGCGTTGTTT 3840

ATTTGCACAA TTTGATTTCC AATATAGGCC ATTTCGCTTC CAATTTAGGA ATGAGGTATA 3900

GTTTCTAAAT ACCGCTCTTA AACCCATGCT CAAAGCATGG GTTTGAAATC TTACAAAACA 3960



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1296 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Met Glu Ile Gln Gln Thr His Arg Lys Ile Asn Arg Pro Leu Val Ser
1 5 10 15

Leu Ala Leu Val Gly Ala Leu Val Ser Ile Thr Pro Gln Gln Ser His
20 25 30

Ala Ala Phe Phe Thr Thr Val Ile Ile Pro Ala Ile Val Gly Gly Ile
35 40 45

Ala Thr Gly Thr Ala Val Gly Thr Val Ser Gly Leu Leu Ser Trp Gly
50 55 60

Leu Lys Gln Ala Glu Glu Ala Asn Lys Thr Pro Asp Lys Pro Asp Lys
65 70 75 80

Val Trp Arg Ile Gln Ala Gly Lys Gly Phe Asn Glu Phe Pro Asn Lys
85 90 95

Glu Tyr Asp Leu Tyr Arg Ser Leu Leu Ser Ser Lys Ile Asp Gly Gly
100 105 110

Trp Asp Trp Gly Asn Ala Ala Arg His Tyr Trp Val Lys Gly Gly Gln
115 120 125

Gln Asn Lys Leu Glu Val Asp Met Lys Asp Ala Val Gly Thr Tyr Thr
130 135 140

Leu Ser Gly Leu Arg Asn Phe Thr Gly Gly Asp Leu Asp Val Asn Met
145 150 155 160

Gln Lys Ala Thr Leu Arg Leu Gly Gln Phe Asn Gly Asn Ser Phe Thr
165 170 175

Ser Tyr Lys Asp Ser Ala Asp Arg Thr Thr Arg Val Asp Phe Asn Ala
180 185 190

Lys Asn Ile Ser Ile Asp Asn Phe Val Glu Ile Asn Asn Arg Val Gly
195 200 205

Ser Gly Ala Gly Arg Lys Ala Ser Ser Thr Val Leu Thr Leu Gln Ala
210 215 220

Ser Glu Gly Ile Thr Ser Asp Lys Asn Ala Glu Ile Ser Leu Tyr Asp
225 230 235 240

Gly Ala Thr Leu Asn Leu Ala Ser Ser Ser Val Lys Leu Met Gly Asn
245 250 255

Val Trp Met Gly Arg Leu Gln Tyr Val Gly Ala Tyr Leu Ala Pro Ser
260 265 270

Tyr Ser Thr Ile Asn Thr Ser Lys Val Thr Gly Glu Val Asn Phe Asn
275 280 285

His Leu Thr Val Gly Asp Lys Asn Ala Ala Gln Ala Gly Ile Ile Ala
290 295 300

Asn Lys Lys Thr Asn Ile Gly Thr Leu Asp Leu Trp Gln Ser Ala Gly
305 310 315 320

Leu Asn Ile Ile Ala Pro Pro Glu Gly Gly Tyr Lys Asp Lys Pro Asn
325 330 335

Asn Thr Pro Ser Gln Ser Gly Ala Lys Asn Asp Lys Asn Glu Ser Ala
340 345 350

Lys Asn Asp Lys Gln Glu Ser Ser Gln Asn Asn Ser Asn Thr Gln Val
355 360 365

Ile Asn Pro Pro Asn Ser Ala Gln Lys Thr Glu Val Gln Pro Thr Gln
370 375 380

Val Ile Asp Gly Pro Phe Ala Gly Gly Lys Asp Thr Val Val Asn Ile
385 390 395 400

Asn Arg Ile Asn Thr Asn Ala Asp Gly Thr Ile Arg Val Gly Gly Phe
405 410 415

Lys Ala Ser Leu Thr Thr Asn Ala Ala His Leu His Ile Gly Lys Gly
420 425 430

Gly Val Asn Leu Ser Asn Gln Ala Ser Gly Arg Ser Leu Ile Val Glu
435 440 445

Asn Leu Thr Gly Asn Ile Thr Val Asp Gly Pro Leu Arg Val Asn Asn
450 455 460

Gln Val Gly Gly Tyr Ala Leu Ala Gly Ser Ser Ala Asn Phe Glu Phe
465 470 475 480

Lys Ala Gly Thr Asp Thr Lys Asn Gly Thr Ala Thr Phe Asn Asn Asp
485 490 495

Ile Ser Leu Gly Arg Phe Val Asn Leu Lys Val Asp Ala His Thr Ala
500 505 510

Asn Phe Lys Gly Ile Asp Thr Gly Asn Gly Gly Phe Asn Thr Leu Asp
515 520 525

Phe Ser Gly Val Thr Asp Lys Val Asn Ile Asn Lys Leu Ile Thr Ala
530 535 540

Ser Thr Asn Val Ala Val Lys Asn Phe Asn Ile Asn Glu Leu Ile Val
545 550 555 560

Lys Thr Asn Gly Ile Ser Val Gly Glu Tyr Thr His Phe Ser Glu Asp
565 570 575

Ile Gly Ser Gln Ser Arg Ile Asn Thr Val Arg Leu Glu Thr Gly Thr
580 585 590

Arg Ser Leu Phe Ser Gly Gly Val Lys Phe Lys Gly Gly Glu Lys Leu
595 600 605

Val Ile Asp Glu Phe Tyr Tyr Ser Pro Trp Asn Tyr Phe Asp Ala Arg
610 615 620

Asn Ile Lys Asn Val Glu Ile Thr Asn Lys Leu Ala Phe Gly Pro Gln
625 630 635 640

Gly Ser Pro Trp Gly Thr Ser Lys Leu Met Phe Asn Asn Leu Thr Leu
645 650 655

Gly Gln Asn Ala Val Met Asp Tyr Ser Gln Phe Ser Asn Leu Thr Ile
660 665 670

Gln Gly Asp Phe Ile Asn Asn Gln Gly Thr Ile Asn Tyr Leu Val Arg
675 680 685

Gly Gly Lys Val Ala Thr Leu Ser Val Gly Asn Ala Ala Ala Met Met
690 695 700

Phe Asn Asn Asp Ile Asp Ser Ala Thr Gly Phe Tyr Lys Pro Leu Ile
705 710 715 720

Lys Ile Asn Ser Ala Gln Asp Leu Ile Lys Asn Thr Glu His Val Leu
725 730 735

Leu Lys Ala Lys Ile Ile Gly Tyr Gly Asn Val Ser Thr Gly Thr Asn
740 745 750

Gly Ile Ser Asn Val Asn Leu Glu Glu Gln Phe Lys Glu Arg Leu Ala
755 760 765

Leu Tyr Asn Asn Asn Asn Arg Met Asp Thr Cys Val Val Arg Asn Thr
770 775 780

Asp Asp Ile Lys Ala Cys Gly Met Ala Ile Gly Asp Gln Ser Met Val
785 790 795 800

Asn Asn Pro Asp Asn Tyr Lys Tyr Leu Ile Gly Lys Ala Trp Lys Asn
805 810 815

Ile Gly Ile Ser Lys Thr Ala Asn Gly Ser Lys Ile Ser Val Tyr Tyr
820 825 830

Leu Gly Asn Ser Thr Pro Thr Glu Asn Gly Gly Asn Thr Thr Asn Leu
835 840 845

Pro Thr Asn Thr Thr Ser Asn Ala Arg Ser Ala Asn Asn Ala Leu Ala
850 855 860

Gln Asn Ala Pro Phe Ala Gln Pro Ser Ala Thr Pro Asn Leu Val Ala
865 870 875 880

Ile Asn Gln His Asp Phe Gly Thr Ile Glu Ser Val Phe Glu Leu Ala
885 890 895

Asn Arg Ser Lys Asp Ile Asp Thr Leu Tyr Ala Asn Ser Gly Ala Gln
900 905 910

Gly Arg Asp Leu Leu Gln Thr Leu Leu Ile Asp Ser His Asp Ala Gly
915 920 925

Tyr Ala Arg Lys Met Ile Asp Ala Thr Ser Ala Asn Glu Ile Thr Lys
930 935 940

Gln Leu Asn Thr Ala Thr Thr Thr Leu Asn Asn Ile Ala Ser Leu Glu
945 950 955 960

His Lys Thr Ser Gly Leu Gln Thr Leu Ser Leu Ser Asn Ala Met Ile
965 970 975

Leu Asn Ser Arg Leu Val Asn Leu Ser Arg Arg His Thr Asn His Ile
980 985 990

Asp Ser Phe Ala Lys Arg Leu Gln Ala Leu Lys Asp Gln Lys Phe Ala
995 1000 1005

Ser Leu Glu Ser Ala Ala Glu Val Leu Tyr Gln Phe Ala Pro Lys Tyr
1010 1015 1020

Glu Lys Pro Thr Asn Val Trp Ala Asn Ala Ile Gly Gly Thr Ser Leu
1025 1030 1035 1040

Asn Asn Gly Ser Asn Ala Ser Leu Tyr Gly Thr Ser Ala Gly Val Asp
1045 1050 1055

Ala Tyr Leu Asn Gly Gln Val Glu Ala Ile Val Gly Gly Phe Gly Ser
1060 1065 1070

Tyr Gly Tyr Ser Ser Phe Asn Asn Arg Ala Asn Ser Leu Asn Ser Gly
1075 1080 1085

Ala Asn Asn Thr Asn Phe Gly Val Tyr Ser Arg Ile Phe Ala Asn Gln
1090 1095 1100

His Glu Phe Asp Phe Glu Ala Gln Gly Ala Leu Gly Ser Asp Gln Ser
1105 1110 1115 1120

Ser Leu Asn Phe Lys Ser Ala Leu Leu Gln Asp Leu Asn Gln Ser Tyr
1125 1130 1135

His Tyr Leu Ala Tyr Ser Ala Ala Thr Arg Ala Ser Tyr Gly Tyr Asp
1140 1145 1150

Phe Ala Phe Phe Arg Asn Ala Leu Val Leu Lys Pro Ser Val Gly Val
1155 1160 1165

Ser Tyr Asn His Leu Gly Ser Thr Asn Phe Lys Ser Asn Ser Thr Asn
1170 1175 1180

Gln Val Ala Leu Lys Asn Gly Ser Ser Ser Gln His Leu Phe Asn Ala
1185 1190 1195 1200

Ser Ala Asn Val Glu Ala Arg Tyr Tyr Tyr Gly Asp Thr Ser Tyr Phe
1205 1210 1215

Tyr Met Asn Ala Gly Val Leu Gln Glu Phe Ala His Val Gly Ser Asn
1220 1225 1230

Asn Ala Ala Ser Leu Asn Thr Phe Lys Val Asn Ala Ala Arg Asn Pro
1235 1240 1245

Leu Asn Thr His Ala Arg Val Met Met Gly Gly Glu Leu Lys Leu Ala
1250 1255 1260

Lys Glu Val Phe Leu Asn Leu Gly Val Val Tyr Leu His Asn Leu Ile
1265 1270 1275 1280

Ser Asn Ile Gly His Phe Ala Ser Asn Leu Gly Met Arg Tyr Ser Phe
1285 1290 1295

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5925 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:



CTCCATTTTA AGCAACTCCA TAGACCACTA AAGAAACTTT TTTTGAGGCT ATCTTTGAAA 60

ATCTGTCCTA TTGATTTGTT TTCCATTTTG TTTCCCATGT GGATCTTGTG GATCACAAAC 120

GCTTAATTAT ACATGCTATA GTAAGCATGA CACACAAACC AAACTATTTT TAGAACGCTT 180

CATGTGCTCA CCTTGACTAA CCATTTCTCC AACCATACTT TAGCGTTGCA TTTGATTTCT 240

TOIAAAAAGAT TCATTTCTTA TTTCTTGTTC TTATTAAAGT TCTTTCATTT TAGCAAATTT 300

TfGTTAATTG TGGGTAAAAA TGTGAATCGT CCTAGCCTTT AGACGCCTGC AACGATCGGG 360

(If TTTTTCAA TATTAATAAT GATTAATGAA AAAAAAAAAA AATGCTTGAT ATTGTTGTAT 420

i^TGAGAATG TTCAAAGACA TGAATTGACT ACTCAAGCGT GTAGCGATTT TTAGCAGTCT 480

t^GACACTAA CAAGATACCG ATAGGTATGA AACTAGGTAT AGTAAGGAGA AACAATGACT 540

XSCGAAACCA TTGACCAACA ACCACAAACC GAAGCGGCTT TTAACCCGCA GCAATTTATC 600

J^TAATCTTC AAGTAGCTTT TCTTAAAGTT GATAACGCTG TCGCTTCATA CGATCCTGAT 660

Ci^AAAACCAA TCGTTGATAA GAACGATAGG GATAACAGGC AAGCTTTTGA AGGAATCTCG 720

CAATTAAGGG AAGAATACTC CAATAAAGCG ATCAAAAATC CTACCAAAAA GAATCAGTAT 780

TTTTCAGACT TTATCAATAA GAGCAATGAT TTAATCAACA AAGACAATCT CATTGATGTA 840

GAATCTTCCA CAAAGAGCTT TCAGAAATTT GGGGATCAGC GTTACCGAAT TTTCACAAGT 900

TGGGTGTCCC ATCAAAACGA TCCGTCTAAA ATCAACACCC GATCGATCCG AAATTTTATG 960

GAAAATATCA TACAACCCCC TATCCTTGAT GATAAAGAGA AAGCGGAGTT TTTGAAATCT 1020

GCCAAACAAT CTTTTGCAGG AATCATTATA GGGAATCAAA TCCGAACGGA TCAAAAGTTC 1080

ATGGGCGTGT TTGATGAGTC CTTGAAAGAA AGGCAAGAAG CAGAAAAAAA TGGAGAGCCT 1140

ACTGGTGGGG ATTGGTTGGA TATTTTTCTC TCATTTATAT TTGACAAAAA ACAATCTTCT 1200

GATGTCAAAG AAGCAATCAA TCAAGAACCA GTTCCCCATG TCCAACCAGA TATAGCCACT 1260

ACCACCACCG ACATACAAGG CTTACCGCCT GAAGCTAGAG ATTTACTTGA TGAAAGGGGT 1320

AATTTTTCTA AATTCACTCT TGGCGATATG GAAATGTTAG ATGTTGAGGG AGTCGCTGAC 1380


ATTGATCCCA ATTACAAGTT CAATCAATTA TTGATTCACA ATAACGCTCT GTCTTCTGTG 1440

TTAATGGGGA GTCATAATGG CATAGAACCT GAAAAAGTTT CATTGTTGTA TGGGGGCAAT 1500

GGTGGTCCTG GAGCTAGGCA TGATTGGAAC GCCACCGTTG GTTATAAAGA CCAACAAGGC 1560

AACAATGTGG CTACAATAAT TAATGTGCAT ATGAAAAACG GCAGTGGCTT AGTCATAGCA 1620

GGTGGTGAGA AAGGGATTAA CAACCCTAGT TTTTATCTCT ACAAAGAAGA CCAACTCACA 1680

GGCTCACAAC GAGCATTAAG TCAAGAAGAG ATCCAAAACA AAATAGATTT CATGGAATTT 1740

CTTGCACAAA ATAATGCTAA ATTAGACAAC TTGAGCGAGA AAGAGAAGGA AAAATTCCGA 1800

ACTGAGATTA AAGATTTCCA AAAAGACTCT AAGGCTTATT TAGACGCCCT AGGGAATGAT 1860

CCpJATTGCTT TTGTTTCTAA AAAAGACACA AAACATTCAG CTTTAATTAC TGAGTTTGGT 1920

AAfGGGGATT TGAGCTACAC TCTCAAAGAT TATGGGAAAA AAGCAGATAA AGCTTTAGAT 1980

AGGGAGAAAA ATGTTACTCT TCAAGGTAGC CTAAAACATG ATGGCGTGAT GTTTGTTGAT 2040

TATTCTAATT TCAAATACAC CAACGCCTCC AAGAATCCCA ATAAGGGTGT AGGCGTTACG 2100

AilfGGCGTTT CCCATTTAGA AGTAGGCTTT AACAAGGTAG CTATCTTTAA TTTGCCTGAT 2160

tI^aataatc TCGCTATCAC TAGTTTCGTA AGGCGGAATT TAGAGGATAA ACTAACCACT 2220

aaSggattgt CCCCACAAGA AGCTAATAAG CTTATCAAAG ATTTTTTGAG CAGCAACAAA 2280

GAATTGGTTG GAAAAACTTT AAACTTCAAT AAAGCTGTAG CTGACGCTAA AAAa\CAGGC 2340

AATTATGATG AAGTGAAAAA AGCTCAGAAA GATCTTGAAA AATCTCTAAG GAAACGAGAG 2400

CATTTAGAGA AAGAAGTAGA GAAAAAATTG GAGAGCAAAA GCGGCAACAA AAATAAAATG 2460

GAAGCAAAAG CTCAAGCTAA CAGCCAAAAA GATGAGATTT TTGCGTTGAT CAATAAAGAG 2520

GCTAATAGAG ACGCAAGAGC AATCGCTTAC GCTCAGAATC TTAAAGGCAT CAAAAGGGAA 2580

TTGTCTGATA AACTTGAAAA TGTCAACAAG AATTTGAAAG ACTTTGATAA ATCTTTTGAT 2640

GAATTCAAAA ATGGCAAAAA TAAGGATTTC AGCAAGGCAG AAGAAACACT AAAAGCCCTT 2700

AAAGGTTCGG TGAAAGATTT AGGTATCAAT CCAGAATGGA TTTCAAAAGT TGAAAACCTT 2760

AATGCAGCTT TGAATGAATT CAAAAATGGG AAAAATAAGG ATTTCAGCAA GGTAACGCAA 2820

GCAAAAAGCG ACCTTGAAAA TTCCGTTAAA GATGTGATCA TCAATCAAAA GGTAACGGAT 2880

AAAGTTGATA ATCTCAATCA AGCGGTATCA GTGGCTAAAG CAACGGGTGA TTTCAGTAGG 2940

GTAGAGCAAG CGTTAGCCGA TCTCAAAAAT TTCTCAAAGG AGCAATTGGC CCAACAAGCT 3000

CAAAAAAATG AAAGTCTCAA TGCTAGAAAA AAATCTGAAA TATATCAATC CGTTAAGAAT 3060

GGTGTGAATG GAACCCTAGT CGGTAATGGG TTATCTCAAG CAGAAGCCAC AACTCTTTCT 3120

AAAAACTTTT CGGACATCAA GAAAGAGTTG AATGCAAAAC TTGGAAATTT CAATAACAAT 3180

AACAATAATG GACTCAAAAA CGAACCCATT TATGCTAAAG TTAATAAAAA GAAAGCAGGG 3240

CAAGCAGCTA GCCTTGAAGA ACCCATTTAC GCTCAAGTTG CTAAAAAGGT AAATGCAAAA 3300

ATTGACCGAC TCAATCAAAT AGCAAGTGGT TTGGGTGTTG TAGGGCAAGC AGCGGGCTTC 3360

qep?TTGAAAA GGCATGATAA AGTTGATGAT CTCAGTAAGG TAGGGCTTTC AAGGAATCAA 3420

GA^TTGGCTC AGAAAATTGA CAATCTCAAT CAAGCGGTAT CAGAAGCTAA AGGAGGTTTT 3480

I^ITGGCAATC TAGAGCAAAC GATAGACAAG CTCAAAGATT CTACAAAACA CAATCCCATG 3540

Aj?^CTATGGG TTGAAAGTGC AAAAT^AAGTA CCTGCTAGTT TGTCAGCGAA ACTAGACAAT 3600

TAbGCTACTA ACAGCCACAT ACGCATTAAT AGCAATATCA AAAATGGAGC AATCAATGAA 3660

AkkGCGACCG GCATGCTAAC GCAAAAAAAC CCTGAGTGGC TCAAGCTCGT GAATGATAAG 3720

AtfAGTTGCGC ATAATGTAGG AAGCGTTCCT TTGTCAGAGT ATGATAAAAT TGGCTTCAAC 3780

CA^AAGAATA TGAAAGATTA TTCTGATTCG TTCAAGTTTT CCACCAAGTT GAACAATGCT 3840

GTAAAAGACA CTAATTCTGG CTTTACGCAA TTTTTAACCA ATGCATTTTC TACAGCATCT 3900

TATTACTGCT TGGCGAGAGA AAATGCGGAG CATGGAATCA AGAACGTTAA TACAAAAGGT 3960

GGTTTCCAAA AATCTTAAAG GATTAAGGAA TACCAAAAAC GCAAAAACCA CCCCTTGCTA 4020

AAAGCGAGGG GTTTTTTAAT ACTCCTTAGC AGAAATCCCA ATCGTCTTTA GTATTTGGGA 4080

TGAATGCTAC CAATTCATGG TATCATATCC CCATACATTC GTATCTAGCG TAGGAAGTGT 4140

GCAAAGTTAC GCCTTTGGAG ATATGATGTG TGAGACCTGT AGGGAATGCG TTGGAGCTCA 4200

AACTCTGTAA AATCCCTATT ATAGGGACAC AGAGTGAGAA CCAAACTCTC CCTACGGGCA 4260

ACATCAGCCT AGGAAGCCCA ATCGTCTTTA GCGGTTGGGC ACTTCACCTT AAAATATCCC 4320

GACAGACACT AACGAAAGGC TTTGTTCTTT AAAGTCTGCA TGGATATTTC CTACCCCAAA 4380

AAGACTTAAC CCTTTGCTTA AAATTAAGTT TGATTGTGCT AGTGGGTTCG TGCTATAGTG 4440

CGAAAATTAA TTAAGGGTTA TAAAGAGAGC ATAAACTAGA AAAAACAAGT AGCTATAACA 4500


AAGATCAAGT TCAAAAAATC ATAGAGCTTT TAGAGCAAAT TGATCGCGCT CTTAACCAAA 4560

GAAAAATCAG AAAAACCATA GGAATTATCA CACCTTATAA TGCCCAAAAA AGACGCTTGC 4620

GATCAGAAGT GGAAAAATAC GGCTTCAAGA ATTTTGATGA GCTCAAAATA GACACTGTGG 4680

ATGCCTTTCA AGGTGAAGAG GCAGATATTA TTATTTATTC CACCGTGAAA ACTTGTGGTA 4740

ATCTTTCTTT CTTGCTAGAT TCTAAACGCT TGAATGTGGC TATTTCTAGG GCAAAAGAAA 4800

ATCTCATTTT TGTGGGTAAA AAGTCTTTCT TTGAGAATTT ATGAAGCGAT GAGAAGAATA 4860

TCTTTAGCGC TATTTTGCAA GTCTGTAGAT AGGTAATCTT TTCCAAAGAT AATCATTAGA 4920

CAITCTTCGC TTCAAAACGC TTTCATAAAT CTCTCTAAAG CGCTTTATAA TCAACACAAT 4980

AQ^gCTTATAG TGTGAGCTAT AGCCCCTTTT TGGGAATTGA GTTATTTTGA CTTTAAATTT 5040

TTIATTAGCGT TACAATTTGA GCCATTCTTT AGCTTGTTTT TCTAGCCAGA TCACATCGCC 5100

GGT^CGCATGA AATTCCACTT TAGGGAATGC GTGTGCATTT TTTTTAAGGG CGTATTTTTG 5160

Cl||i?CAAATAT CCTACAATAG CATCGCCCGA ATGGATGAGT AGGGGGGGTG TTGAAAGGGC 5220

A;6^TGCTCC ATAAAATAGC CCTCAATTTT TTGAGCGATT AAGGGAAAAT GCGTGCAACC 5280

t;^^3|vataatc ACTTCGGGAA AATCTTTAAG GGAGTGAAAT AATAACGCAT GCAAGTTTCT 5340

AA^ci^TTCGC CCTCTAAAAT ACTTTCTTCA ATCAAAGGCA CAAAAAGAGA AGTGGCTAAA 5400

TGCGAAACAT TCAAATAGCC TTGTTGTTTC AGGGCATTGT CATAAGCGTT GGATTGGATC 5460

GTCGCTTTTG TCCCTAGCAC TAAAATAGGG GCGTTTTTAT CTTTTACTTG TCGCTTGATC 5520

GCTAAAATGC TTGGCTCAAT CACGCCCACA ATAGGGATTT TGGAATGCTT TTGCATCTCT 5580

TCTAAAGCTA GAGCGCTCGC TGTGTTGCAT GCCACAATCA ATAATTCAAT CTGGTGCGGT 5640

TTGAAAAAAT CCAAAGCCTC TAAGCCAAAT TGCTTGATCG TAGTGGGGTC TTTAGTGCCA 5700

TAAGGCACTC TAGCCGTATC GCCATAATAG ATGATTTCAT CAAATAATTG CGCTTTTAAA 5760

AGGCTTTTTA AAACGCTAAA CCCTCCCACA CCGCTATCAA AAACGCCTAT TTTCATGACA 5820

CTTTTTTAAT TTAATGGGAT TAATTAGGGA TTTTATTTTT CATTCATTAA GTTTAAAAAT 5880

TCTTCATTGT CCTTAGTTTG TTGCATTTTA GAATAGACAA AGCTT 5925


(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1147 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Met Thr Asn Glu Thr Ile Asp Gln Gln Pro Gln Thr Glu Ala Ala Phe
1 5 10 15

Asn Pro Gln Gln Phe Ile Asn Asn Leu Gln Val Ala Phe Leu Lys Val
20 25 30

Asp Asn Ala Val Ala Ser Tyr Asp Pro Asp Gln Lys Pro Ile Val Asp
35 40 45

Lys Asn Asp Arg Asp Asn Arg Gln Ala Phe Glu Gly Ile Ser Gln Leu
50 55 60

Arg Glu Glu Tyr Ser Asn Lys Ala Ile Lys Asn Pro Thr Lys Lys Asn
65 70 75 80

Gln Tyr Phe Ser Asp Phe Ile Asn Lys Ser Asn Asp Leu Ile Asn Lys
85 90 95

Asp Asn Leu Ile Asp Val Glu Ser Ser Thr Lys Ser Phe Gln Lys Phe
100 105 110

Gly Asp Gln Arg Tyr Arg Ile Phe Thr Ser Trp Val Ser His Gln Asn
115 120 125

Asp Pro Ser Lys Ile Asn Thr Arg Ser Ile Arg Asn Phe Met Glu Asn
130 135 140

Ile Ile Gln Pro Pro Ile Leu Asp Asp Lys Glu Lys Ala Glu Phe Leu
145 150 155 160

Lys Ser Ala Lys Gln Ser Phe Ala Gly Ile Ile Ile Gly Asn Gln Ile
165 170 175

Arg Thr Asp Gln Lys Phe Met Gly Val Phe Asp Glu Ser Leu Lys Glu
180 185 190

Arg Gln Glu Ala Glu Lys Asn Gly Glu Pro Thr Gly Gly Asp Trp Leu
195 200 205

Asp Ile Phe Leu Ser Phe Ile Phe Asp Lys Lys Gln Ser Ser Asp Val
210 215 220

Lys Glu Ala Ile Asn Gln Glu Pro Val Pro His Val Gln Pro Asp Ile
225 230 235 240

Ala Thr Thr Thr Thr Asp Ile Gln Gly Leu Pro Pro Glu Ala Arg Asp
245 250 255

Leu Leu Asp Glu Arg Gly Asn Phe Ser Lys Phe Thr Leu Gly Asp Met
260 265 270

Glu Met Leu Asp Val Glu Gly Val Ala Asp Ile Asp Pro Asn Tyr Lys
275 280 285

Phe Asn Gln Leu Leu Ile His Asn Asn Ala Leu Ser Ser Val Leu Met
290 295 300

Gly Ser His Asn Gly Ile Glu Pro Glu Lys Val Ser Leu Leu Tyr Gly
305 310 315 320

Gly Asn Gly Gly Pro Gly Ala Arg His Asp Trp Asn Ala Thr Val Gly
325 330 335

Tyr Lys Asp Gln Gln Gly Asn Asn Val Ala Thr Ile Ile Asn Val His
340 345 350

Met Lys Asn Gly Ser Gly Leu Val Ile Ala Gly Gly Glu Lys Gly Ile
355 360 365

Asn Asn Pro Ser Phe Tyr Leu Tyr Lys Glu Asp Gln Leu Thr Gly Ser
370 375 380

Gln Arg Ala Leu Ser Gln Glu Glu Ile Gln Asn Lys Ile Asp Phe Met
385 390 395 400

Glu Phe Leu Ala Gln Asn Asn Ala Lys Leu Asp Asn Leu Ser Glu Lys
405 410 415

Glu Lys Glu Lys Phe Arg Thr Glu Ile Lys Asp Phe Gln Lys Asp Ser
420 425 430

Lys Ala Tyr Leu Asp Ala Leu Gly Asn Asp Arg Ile Ala Phe Val Ser
435 440 445

Lys Lys Asp Thr Lys His Ser Ala Leu Ile Thr Glu Phe Gly Asn Gly
450 455 460

Asp Leu Ser Tyr Thr Leu Lys Asp Tyr Gly Lys Lys Ala Asp Lys Ala
465 470 475 480

Leu Asp Arg Glu Lys Asn Val Thr Leu Gln Gly Ser Leu Lys His Asp
485 490 495



Gly Val Met Phe Val Asp Tyr Ser Asn Phe Lys Tyr Thr Asn Ala Ser
500 505 510

Lys Asn Pro Asn Lys Gly Val Gly Val Thr Asn Gly Val Ser His Leu
515 520 525

Glu Val Gly Phe Asn Lys Val Ala Ile Phe Asn Leu Pro Asp Leu Asn
530 535 540

Asn Leu Ala Ile Thr Ser Phe Val Arg Arg Asn Leu Glu Asp Lys Leu
545 550 555 560

Thr Thr Lys Gly Leu Ser Pro Gln Glu Ala Asn Lys Leu Ile Lys Asp
565 570 575

Phe Leu Ser Ser Asn Lys Glu Leu Val Gly Lys Thr Leu Asn Phe Asn
580 585 590

Lys Ala Val Ala Asp Ala Lys Asn Thr Gly Asn Tyr Asp Glu Val Lys
595 600 605

Lys Ala Gln Lys Asp Leu Glu Lys Ser Leu Arg Lys Arg Glu His Leu
610 615 620

Glu Lys Glu Val Glu Lys Lys Leu Glu Ser Lys Ser Gly Asn Lys Asn
625 630 635 640

Lys Met Glu Ala Lys Ala Gln Ala Asn Ser Gln Lys Asp Glu Ile Phe
645 650 655

Ala Leu Ile Asn Lys Glu Ala Asn Arg Asp Ala Arg Ala Ile Ala Tyr
660 665 670

Ala Gln Asn Leu Lys Gly Ile Lys Arg Glu Leu Ser Asp Lys Leu Glu
675 680 685

Asn Val Asn Lys Asn Leu Lys Asp Phe Asp Lys Ser Phe Asp Glu Phe
690 695 700

Lys Asn Gly Lys Asn Lys Asp Phe Ser Lys Ala Glu Glu Thr Leu Lys
705 710 715 720

Ala Leu Lys Gly Ser Val Lys Asp Leu Gly Ile Asn Pro Glu Trp Ile
725 730 735

Ser Lys Val Glu Asn Leu Asn Ala Ala Leu Asn Glu Phe Lys Asn Gly
740 745 750

Lys Asn Lys Asp Phe Ser Lys Val Thr Gln Ala Lys Ser Asp Leu Glu
755 760 765

Asn Ser Val Lys Asp Val Ile Ile Asn Gln Lys Val Thr Asp Lys Val
770 775 780

Asp Asn Leu Asn Gln Ala Val Ser Val Ala Lys Ala Thr Gly Asp Phe
785 790 795 800

Ser Arg Val Glu Gln Ala Leu Ala Asp Leu Lys Asn Phe Ser Lys Glu
805 810 815

Gln Leu Ala Gln Gln Ala Gln Lys Asn Glu Ser Leu Asn Ala Arg Lys
820 825 830

Lys Ser Glu Ile Tyr Gln Ser Val Lys Asn Gly Val Asn Gly Thr Leu
835 840 845

Val Gly Asn Gly Leu Ser Gln Ala Glu Ala Thr Thr Leu Ser Lys Asn
850 855 860

Phe Ser Asp Ile Lys Lys Glu Leu Asn Ala Lys Leu Gly Asn Phe Asn
865 870 875 880

Asn Asn Asn Asn Asn Gly Leu Lys Asn Glu Pro Ile Tyr Ala Lys Val
885 890 895

Asn Lys Lys Lys Ala Gly Gln Ala Ala Ser Leu Glu Glu Pro Ile Tyr
900 905 910

Ala Gln Val Ala Lys Lys Val Asn Ala Lys Ile Asp Arg Leu Asn Gln
915 920 925

Ile Ala Ser Gly Leu Gly Val Val Gly Gln Ala Ala Gly Phe Pro Leu
930 935 940

Lys Arg His Asp Lys Val Asp Asp Leu Ser Lys Val Gly Leu Ser Arg
945 950 955 960

Asn Gln Glu Leu Ala Gln Lys Ile Asp Asn Leu Asn Gln Ala Val Ser
965 970 975

Glu Ala Lys Ala Gly Phe Phe Gly Asn Leu Glu Gln Thr Ile Asp Lys
980 985 990

Leu Lys Asp Ser Thr Lys His Asn Pro Met Asn Leu Trp Val Glu Ser
995 1000 1005

Ala Lys Lys Val Pro Ala Ser Leu Ser Ala Lys Leu Asp Asn Tyr Ala
1010 1015 1020

Thr Asn Ser His Ile Arg Ile Asn Ser Asn Ile Lys Asn Gly Ala Ile
1025 1030 1035 1040

Asn Glu Lys Ala Thr Gly Met Leu Thr Gln Lys Asn Pro Glu Trp Leu
1045 1050 1055

Lys Leu Val Asn Asp Lys Ile Val Ala His Asn Val Gly Ser Val Pro
1060 1065 1070

Leu Ser Glu Tyr Asp Lys Ile Gly Phe Asn Gln Lys Asn Met Lys Asp
1075 1080 1085

Tyr Ser Asp Ser Phe Lys Phe Ser Thr Lys Leu Asn Asn Ala Val Lys
1090 1095 1100

Asp Thr Asn Ser Gly Phe Thr Gln Phe Leu Thr Asn Ala Phe Ser Thr
1105 1110 1115 1120

Ala Ser Tyr Tyr Cys Leu Ala Arg Glu Asn Ala Glu His Gly Ile Lys
1125 1130 1135

Asn Val Asn Thr Lys Gly Gly Phe Gln Lys Ser
1140 1145

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 546 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Ala Lys Glu Ile Lys Phe Ser Asp Ser Ala Arg Asn Leu Leu Phe
1 5 10 15

Glu Gly Val Arg Gln Leu His Asp Ala Val Lys Val Thr Met Gly Pro
20 25 30

Arg Gly Arg Asn Val Leu Ile Gln Lys Ser Tyr Gly Ala Pro Ser Ile
35 40 45

Thr Lys Asp Gly Val Ser Val Ala Lys Glu Ile Glu Leu Ser Cys Pro
50 55 60

Val Ala Asn Met Gly Ala Gln Leu Val Lys Glu Val Ala Ser Lys Thr
65 70 75 80

Ala Asp Ala Ala Gly Asp Gly Thr Thr Thr Ala Thr Val Leu Ala Tyr
85 90 95

Ser Ile Phe Lys Glu Gly Leu Arg Asn Ile Thr Ala Gly Ala Asn Pro
100 105 110

Ile Glu Val Lys Arg Gly Met Asp Lys Ala Ala Glu Ala Ile Ile Asn
115 120 125

Glu Leu Lys Lys Ala Ser Lys Lys Val Gly Gly Lys Glu Glu lie Thr 
130 135 140 

Gin Val Ala Thr lie Ser Ala Asn Ser Asp His Asn lie Gly Lys Leu 
145 150 , 155 160 

lie Ala Asp Ala Met Glu Lys Val Gly Lys Asp Gly Val lie Thr Val 

165 170 175 

Glu Glu Ala Lys Gly lie Glu Asp Glu Leu Asp Val Val Glu Gly Met 
180 185 190 

Gin Phe Asp Arg Gly Tyr Leu Ser Pro Tyr Phe Val Thr Asn Ala Glu 
195 200 205 

Lys Met Thr Ala Gin Leu Asp Asn Ala Tyr lie Leu Leu Thr Asp Lys 
210 215 220 

Lys lie Ser Ser Met Lys Asp lie Leu Pro Leu Leu Glu Lys Thr Met 
225 230 235 240 

Lys Glu Gly Lys Pro Leu Leu Ile- Ile Ala Glu Asp/ lie Glu Gly Glu 

245 250 255 

Ala Leu Thr Thr Leu Val Val Asn Lys Leu Arg Gly Val Leu Asn lie 
260 265 270 

Ala Ala Val Lys Ala Pro Gly Phe Gly Asp Arg Arg Lys Glu Met Leu 
275 280 285 

Lys Asp Ile Ala Ile Leu Thr Gly Gly Gln Val Ile Ser Glu Glu Leu 
290 295 300 

Gly Leu Ser Leu Glu Asn Ala Glu Val Glu Phe Leu Gly Lys Ala Gly 
305 310 315 320 

Arg Ile Val Ile Asp Lys Asp Asn Thr Thr Ile Val Asp Gly Lys Gly 

325 330 335 

His Ser Asp Asp Val Lys Asp Arg Val Ala Gln Ile Lys Thr Gln Ile 
340 345 350 

Ala Ser Thr Thr Ser Asp Tyr Asp Lys Glu Lys Leu Gln Glu Arg Leu 
355 360 365 

Ala Lys Leu Ser Gly Gly Val Ala Val Ile Lys Val Gly Ala Ala Ser 
370 375 380 

Glu Val Glu Met Lys Glu Lys Lys Asp Arg Val Asp Asp Ala Leu Ser 
385 390 395 400 

Ala Thr Lys Ala Ala Val Glu Glu Gly Ile Val Ile Gly Gly Gly Ala 

405 410 415 

Ala Leu Ile Arg Ala Ala Gln Lys Val His Leu Asn Leu His Asp Asp 
420 425 430 

Glu Lys Val Gly Tyr Glu Ile Ile Met Arg Ala Ile Lys Ala Pro Leu 
435 440 445 

Ala Gln Ile Ala Ile Asn Ala Gly Tyr Asp Gly Gly Val Val Val Asn 
450 455 460 

Glu Val Glu Lys His Glu Gly His Phe Gly Phe Asn Ala Ser Asn Gly 
465 470 475 480 

Lys Tyr Val Asp Met Phe Lys Glu Gly Ile Ile Asp Pro Leu Lys Val 

485 490 495 

Glu Arg Ile Ala Leu Gln Asn Ala Val Ser Val Ser Ser Leu Leu Leu 
500 505 510 

Thr Thr Glu Ala Thr Val His Glu Ile Lys Glu Glu Lys Ala Thr Pro 
515 520 525 

Ala Met Pro Asp Met Gly Gly Met Gly Gly Met Gly Gly Met Gly Gly 
530 535 540 

Met Met 
545 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1838 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

AAGCTTGCTG TCATGATCAC AAAAAACACT AAAAAACATT ATTATTAAGG ATACAAAATG 60 

GCAAAAGAAA TCAAATTTTC AGATAGTGCG AGAAACCTTT TATTTGAAGG CGTGAGGCAA 120 

CTCCATGACG CTGTCAAAGT AACCATGGGG CCAAGAGGCA GGAATGTATT GATCCAAAAA 180 

AGCTATGGCG CTCCAAGCAT CACCAAAGAC GGCGTGAGCG TGGCTAAAGA GATTGAATTA 240 

AGTTGCCCAG TAGCTAACAT GGGCGCTCAA CTCGTTAAAG AAGTAGCGAG CAAAACCGCT 300 

GATGCTGCCG GCGATGGCAC GACCACAGCG ACCGTGCTAG CTTATAGCAT TTTTAAAGAA 360 

GGTTTGAGGA ATATCACGGC TGGGGCTAAC CCTATTGAAG TGAAACGAGG CATGGATAAA 420 

GCTGCTGAAG CGATCATTAA TGAGCTTAAA AAAGCGAGTA AAAAAGTAGG CGGTAAAGAA 480 

GAAATCACCC AAGTGGCGAC CATTTCTGCA AACTCCGATC ACAATATCGG GAAACTCATC 540 

GCTGACGCTA TGGAAAAAGT GGGTAAAGAC GGCGTGATCA CCGTTGAGGA AGCTAAGGGC 600 

ATTGAAGATG AATTGGATGT CGTAGAAGGC ATGCAATTTG ATAGAGGCTA CCTCTCCCCT 660 

TATTTTGTAA CGAACGCTGA GAAAATGACC GCTCAATTGG ATAATGCTTA CATCCTTTTA 720 

ACGGATAAAA AAATCTCTAG CATGAAAGAC ATTCTCCCGC TACTAGAAAA AACCATGAAA 780 

GAGGGCAAAC CGCTTTTAAT CATCGCTGAA GACATTGAGG GCGAAGCTTT AACGACTCTA 840 

Cm^GTGAATA AATTAAGAGG CGTGTTGAAT ATCGCAGCGG TTAAAGCTCC AGGCTTTGGG 900 

GACAGAAGAA AAGAAATGCT CAAAGACATC GCTATTTTAA CCGGCGGTCA AGTCATTAGC 960 

GAAGAATTGG GCTTGAGTCT AGAAAACGCT GAAGTGGAGT TTTTAGGCAA AGCTGGAAGG 1020 

Af^GTGATTG ACAAAGACAA CACCACGATC GTAGATGGCA AAGGCCATAG CGATGATGTT 1080 

AjfVGACAGAG TCGCGCAGAT CAAAACCCAA ATTGCAAGTA CGACAAGCGA TTATGACAAA 1140 

GAAAAATTGC AAGAAAGATT GGCTAAACTC TCTGGCGGTG TGGCTGTGAT TAb?^GTGGGC 1200 

GCTGCGAGTG AAGTGGAAAT GAAAGAGAAA AAAGACCGGG TGGATGACGC GTTGAGCGCG 1260 

ACTAAAGCGG CGGTTGAAGA AGGCATTGTG ATTGGTGGCG GTGCGGCTCT CATTCGCGCG 1320 

GCTCAAAAAG TGCATTTGAA TTTGCACGAT GATGAAAAAG TGGGCTATGA AATCATCATG 1380 

CGCGCCATTA AAGCCCCATT AGCTCAAATC GCTATCAACG CTGGTTATGA TGGCGGTGTG 1440 

GTCGTGAATG AAGTAGAAAA ACACGAAGGG CATTTTGGTT TTAACGCTAG CAATGGCAAG 1500 

TATGTGGATA TGTTTAAAGA AGGCATTATT GACCCCTTAA AAGTAGAAAG GATCGCTCTA 1560 

CAAAATGCGG TTTCGGTTTC AAGCCTGCTT TTAACCACAG AAGCCACCGT GCATGAAATC 1620 

AAAGAAGAAA AAGCGACTCC GGCAATGCCT GATATGGGTG GCATGGGCGG TATGGGAGGC 1680 

ATGGGCGGCA TGATGTAAGC CCGCTTGCTT TTTAGTATAA TCTGCTTTTA AAATCCCTTC 1740 

TCTAAATCCC CCCCTTTCTA AAATCTCTTT TTTGGGGGGG TGCTTTGATA AAACCGCTCG 1800 

CTTGTAAAAA CATGCAACAA AAAATCTCTG TTAAGCTT 1838 